
    Multi-Word Terminology Extraction and Its Role in Document Embedding

    Automated terminology extraction is a crucial task in natural language processing and ontology construction. Termhood can be inferred using linguistic and statistical techniques. This thesis focuses on the statistical methods. Inspired by feature selection techniques in document classification, we experiment with a variety of metrics, including PMI (point-wise mutual information), MI (mutual information), and Chi-squared. We find that PMI is better at identifying the top keywords in a domain, but Chi-squared recognizes more keywords overall. Based on this observation, we propose a hybrid approach, called HMI, that combines the best of PMI and Chi-squared. HMI outperforms both PMI and Chi-squared. The result is verified by comparing the overlap between the extracted keywords and the author-identified keywords in arXiv data. When the corpora are computer science and physics papers, the top-100 hit rate can reach 0.96 for HMI. We also demonstrate that terminologies can improve document embeddings. In this experiment, we treat each machine-identified multi-word terminology as a single word and then use the transformed text as input for the document embedding. Compared with the representations learnt from unigrams only, we observe an improvement of over 9.41% in F1 score on document classification tasks in arXiv data.
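
    The abstract does not give its formulas, so the following is only a minimal sketch of the textbook feature-selection forms of PMI and Chi-squared, computed from a 2x2 table of document counts for one term and one domain. The function names, the count arguments `a`, `b`, `c`, `d`, and the underscore joiner in `merge_terms` are assumptions made for illustration; the HMI combination is not defined in the abstract and is not reproduced here.

```python
import math


def pmi(a, b, c, d):
    """Point-wise mutual information (base 2) between a term and a domain.

    Hypothetical count layout (not taken from the thesis):
      a: in-domain documents containing the term
      b: out-of-domain documents containing the term
      c: in-domain documents without the term
      d: out-of-domain documents without the term
    """
    n = a + b + c + d
    if a == 0:
        return float("-inf")  # term never occurs in the domain
    return math.log2(a * n / ((a + c) * (a + b)))


def chi_squared(a, b, c, d):
    """Chi-squared statistic for the same 2x2 contingency table."""
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0  # degenerate table: an empty row or column
    return n * (a * d - c * b) ** 2 / denom


def merge_terms(tokens, terms, joiner="_"):
    """Replace each multi-word term with a single token (greedy longest match)."""
    term_set = {tuple(t.split()) for t in terms}
    max_len = max((len(t) for t in term_set), default=1)
    out, i = [], 0
    while i < len(tokens):
        # Try the longest candidate span first, down to two words.
        for k in range(min(max_len, len(tokens) - i), 1, -1):
            if tuple(tokens[i:i + k]) in term_set:
                out.append(joiner.join(tokens[i:i + k]))
                i += k
                break
        else:
            out.append(tokens[i])
            i += 1
    return out
```

    Under these assumptions, candidate terms could be ranked by either score, and `merge_terms` could then rewrite the text so that each extracted multi-word term is seen as one word by whichever embedding model is used downstream.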

    Impact of Aspect Oriented Programming on Software Development Quality Metrics

    Aspect-oriented programming (AOP) is a new paradigm for improving system features such as modularity, readability and maintainability. Owing to better modularisation of cross-cutting concerns, the resulting system implementation would be less complex and more readable. Thus, software development efficiency would increase, and the system would be created faster than its object-oriented programming (OOP) equivalent. In this paper, we provide some insight into the OO software development quality metrics that were significantly associated with using AOP. The method that we are currently studying is based on the popular C & K metrics suite, which extends the metrics traditionally used with the OO paradigm and also extends to the AO paradigm. We argue that a shift similar to the one leading to Chidamber and Kemerer’s metrics is necessary when moving from OO to AOP software.